Parallel, multigrain iterative solvers for hiding network latencies on MPPs and networks of clusters

Authors

  • James R. McCombs
  • Andreas Stathopoulos
Abstract

Parallel iterative solvers are often the only means of solving large linear systems and eigenproblems. However, these solvers are usually implemented in a fine-grain manner and, when scaled to large numbers of processors on MPPs, can incur significant performance penalties due to synchronization overheads. This problem is exacerbated in clusters of workstations (COWs) and SMPs that are interconnected via a hierarchy of commodity networking components using standard communication protocols. Because overheads in MPPs and LAN technologies have not improved nearly as much as network bandwidth in recent years, there is a need for innovative parallel implementations of scientific applications that are capable of hiding overheads. In this paper, we describe a novel scheme for improving the scalability of a particular class of numerical algorithms: block iterative solvers that employ flexible preconditioning through an inner iterative method. Block methods are not only robust in the presence of eigenvalue multiplicities and multiple right-hand sides, but also provide better latency tolerance by performing more floating-point operations between synchronizations. We take a different approach to inducing latency tolerance: we increase the granularity at which the preconditioning is performed for each block vector. This is accomplished by splitting the processors into smaller subgroups, which are then used to precondition each block vector concurrently. The rest of the algorithm is still performed in a fine-grain manner. We call this combination of fine-grain and coarse-grain parallelism multigrain. To test the effectiveness of multigrain parallelism, we implemented a multigrain, block Jacobi-Davidson algorithm for computing a few extreme eigenvalues of a symmetric matrix. We obtained improvements of 45-50% over both the block and non-block implementations of the fine-grain method when testing on an IBM SP and on a collection of clusters consisting of Sun workstations.
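The processor-splitting step described in the abstract can be illustrated with a minimal sketch. It assumes a contiguous partition of ranks with one subgroup per block vector; the abstract does not state how the authors form their subgroups, and the names `split_into_subgroups` and `subgroup_color` are hypothetical.

```python
def split_into_subgroups(num_procs, block_size):
    """Partition ranks 0..num_procs-1 into block_size contiguous subgroups.

    Assumption: one subgroup per block vector, with sizes differing by at
    most one. This is an illustrative choice, not the paper's scheme.
    """
    if block_size < 1 or num_procs < block_size:
        raise ValueError("need at least one processor per block vector")
    base, extra = divmod(num_procs, block_size)
    groups = []
    start = 0
    for g in range(block_size):
        # The first `extra` subgroups absorb the leftover processors.
        size = base + (1 if g < extra else 0)
        groups.append(list(range(start, start + size)))
        start += size
    return groups


def subgroup_color(rank, groups):
    """Return the index of the subgroup (and block vector) owning `rank`."""
    for color, members in enumerate(groups):
        if rank in members:
            return color
    raise ValueError("rank not in any subgroup")


if __name__ == "__main__":
    groups = split_into_subgroups(num_procs=10, block_size=4)
    for color, members in enumerate(groups):
        print(f"block vector {color}: preconditioned by ranks {members}")
```

In an MPI setting, a color computed this way is the kind of value one would pass to a communicator split such as `MPI_Comm_split`, so that each subgroup synchronizes only internally during preconditioning while the outer iteration still uses all processors.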

Similar resources

Multigrain Parallelism for Eigenvalue Computations on Networks of Clusters

Clusters of workstations have become a cost-effective means of performing scientific computations. However, large network latencies, resource sharing, and heterogeneity found in networks of clusters and Grids can impede the performance of applications not specifically tailored for use in such environments. A typical example is the traditional fine grain implementations of Krylov-like iterative ...

Dynamic Load Balancing of an Iterative Eigensolver on Networks of Heterogeneous Clusters

Clusters of homogeneous workstations built around fast networks have become popular means of solving scientific problems, and users often have access to several such clusters. Harnessing the collective power of these clusters to solve a single, challenging problem is desirable, but is often impeded by large inter-cluster network latencies and heterogeneity of different clusters. The complexity ...

Challenging Applications on Fast Networks

Parallel computing on clusters of workstations is attractive because of the low costs in comparison to MPPs, but the speed of the local area network limits the class of applications that can be run efficiently. Fortunately, faster network technology is becoming available for the next generation of workstation clusters. This paper studies the effect of running challenging applications that commu...

Network Related Performance Issues and Techniques for MPPs

In this paper we review network related performance issues for current Massively Parallel Processors (MPPs) in the context of some important basic operations in scientific and engineering computation. The communication system is one of the most performance critical architectural components of MPPs. In particular, understanding the demand posed by collective communication is critical in architect...

MPPs versus Clusters

In coming years, if not already, the parallel-processing community can expect to hear regularly from MPP advocates and cluster advocates about why their approach is better. Either pitch is apt to be a hard sell: hard to sell to an informed audience or reader, and dull. The attempt to distinguish between MPPs and clusters is in some cases an empty subject. By the term “cluster,” I mean a group o...

Journal:
  • Parallel Computing

Volume 29  Issue 

Pages  -

Publication date: 2003